Efficient Sim-to-real Transfer of Contact-Rich Manipulation Skills with Online Admittance Residual Learning
Abstract
Learning contact-rich manipulation skills is essential to robotic applications. Such skills require the robots to interact with the environment with feasible manipulation trajectories and suitable compliance control parameters to enable safe and stable contact. However, learning these skills is challenging due to data inefficiency in the real world and the sim-to-real gap in simulation. In this paper, we introduce a hybrid offline-online framework to learn robust manipulation skills. We employ model-free reinforcement learning for the offline phase to obtain the robot motion and compliance control parameters in simulation. Subsequently, in the online phase, we learn the residual of the compliance control parameters to maximize robot performance-related criteria with force sensor measurements in real time. To demonstrate the effectiveness and robustness of our approach, we provide comparative results against existing methods for assembly and pivoting tasks.
Framework Overview
We propose a framework to learn robot manipulation skills that transfer to the real world. The framework contains two phases: skill learning in simulation and admittance adaptation on the real robot. We use model-free RL with domain randomization to learn the robot's motion, enhancing its robustness for direct transfer. The compliance control parameters are learned at the same time and serve as the initialization for online admittance learning. During online execution, we iteratively learn the residual of the admittance control parameters by optimizing criteria on future trajectory smoothness and task completion. We conduct real-world experiments on two typical contact-rich manipulation tasks: assembly and pivoting. Our proposed framework achieves efficient transfer from simulation to the real world and shows excellent generalization to tasks with different kinematic or dynamic properties.
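To make the online phase concrete, the sketch below shows one residual update step in Python: a weighted objective trades off trajectory smoothness against task completion, and the residual on the simulation-learned gains is refined by a simple finite-difference step. This is a minimal sketch under our own assumptions; both cost functions and all names are illustrative stand-ins, not the paper's actual objective or optimizer.

```python
import numpy as np

def smoothness_cost(gains, forces):
    # Placeholder: penalize force oscillation under the current gains.
    return float(np.var(forces)) / (float(np.linalg.norm(gains)) + 1e-8)

def completion_cost(gains, forces):
    # Placeholder: penalize deviation from a desired contact force.
    return abs(float(np.mean(forces)) - 5.0) / (gains[0] + 1e-8)

def residual_update(init_gains, residual, forces, w=0.5, lr=1e-2, eps=1e-4):
    """One online update of the admittance-gain residual via
    finite-difference descent on a weighted objective."""
    def J(res):
        g = init_gains + res
        return (1.0 - w) * smoothness_cost(g, forces) + w * completion_cost(g, forces)

    grad = np.array([(J(residual + eps * e) - J(residual)) / eps
                     for e in np.eye(len(residual))])
    return residual - lr * grad

# Usage: start from the simulation-learned gains and refine online.
gains0 = np.array([400.0, 40.0, 2.0])      # example stiffness, damping, mass
res = np.zeros_like(gains0)
window = 5.0 + 0.5 * np.random.randn(50)   # stand-in force measurements
res = residual_update(gains0, res, window, w=0.4)
```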
Learned Policies in Simulation
Assembly
Learned Policy in Simulation. Success Rate: 100%
Pivoting
Learned Policy in Simulation. Success Rate: 100%
Real World Experiments: Sim-to-real Transfer
Direct Transfer
Assembly Square Peg. Success Rate: 3/10
Manual Tune
Assembly Square Peg. Success Rate: 10/10
Proposed
Assembly Square Peg. Success Rate: 10/10
Direct Transfer
Pivoting Wood Square. Success Rate: 0/10
Manual Tune
Pivoting Wood Square. Success Rate: 10/10
Proposed
Pivoting Wood Square. Success Rate: 9/10
Real World Experiments: Assembly Task Generalization
Direct Transfer
Assembly Triangle Peg. Success Rate: 0/10
Manual Tune
Assembly Triangle Peg. Success Rate: 8/10
Proposed
Assembly Triangle Peg. Success Rate: 10/10
Direct Transfer
Assembly Pentagon Peg. Success Rate: 1/10
Manual Tune
Assembly Pentagon Peg. Success Rate: 9/10
Proposed
Assembly Pentagon Peg. Success Rate: 10/10
Direct Transfer
Assembly Ethernet Connector. Success Rate: 0/10
Manual Tune
Assembly Ethernet Connector. Success Rate: 1/10
Proposed
Assembly Ethernet Connector. Success Rate: 9/10
Direct Transfer
Assembly Waterproof Connector. Success Rate: 0/10
Manual Tune
Assembly Waterproof Connector. Success Rate: 0/10
Proposed
Assembly Waterproof Connector. Success Rate: 9/10
Real World Experiments: Pivoting Task Generalization
Direct Transfer
Pivoting Adapter. Success Rate: 0/10
Manual Tune
Pivoting Adapter. Success Rate: 0/10
Proposed
Pivoting Adapter. Success Rate: 8/10
Direct Transfer
Pivoting Eraser. Success Rate: 0/10
Manual Tune
Pivoting Eraser. Success Rate: 10/10
Proposed
Pivoting Eraser. Success Rate: 9/10
Direct Transfer
Pivoting Pocky Box (short). Success Rate: 0/10
Manual Tune
Pivoting Pocky Box (short). Success Rate: 1/10
Proposed
Pivoting Pocky Box (short). Success Rate: 8/10
Direct Transfer
Pivoting Pocky Box (long). Success Rate: 0/10
Manual Tune
Pivoting Pocky Box (long). Success Rate: 1/10
Proposed
Pivoting Pocky Box (long). Success Rate: 7/10
Ablation on Weight Parameter Selection
We study the effect of the weight parameter in online admittance residual learning. We find that smaller weights tend to prioritize trajectory smoothness, while larger weights prioritize task completion.
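As a concrete reading of this trade-off, the swept weight can be viewed as a convex combination of the two terms (notation ours; see the paper for the exact objective):

$$
J(\theta) \;=\; (1 - w)\, J_{\text{smooth}}(\theta) \;+\; w\, J_{\text{task}}(\theta), \qquad w \in [0, 1],
$$

so w = 0.0 optimizes smoothness alone and w = 1.0 optimizes task completion alone.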
w=0.0
Assembly Task. Success Rate: 8/10
w=0.2
Assembly Task. Success Rate: 9/10
w=0.4
Assembly Task. Success Rate: 10/10
w=0.6
Assembly Task. Success Rate: 10/10
w=0.8
Assembly Task. Success Rate: 10/10
w=1.0
Assembly Task. Success Rate: 10/10
w=0.0
Pivoting Task. Success Rate: 10/10
w=0.2
Pivoting Task. Success Rate: 9/10
w=0.4
Pivoting Task. Success Rate: 9/10
w=0.6
Pivoting Task. Success Rate: 7/10
w=0.8
Pivoting Task. Success Rate: 9/10
w=1.0
Pivoting Task. Success Rate: 6/10
Additional Experiments: Screwing
We additionally evaluate the effectiveness of the framework on a screwing task. We directly applied the policy learned for the square peg-in-hole task, together with the proposed online admittance learning method, to an M10 bolt-nut assembly task and found that it can robustly align the bolt with the nut. Therefore, instead of retraining from scratch, we reused this policy and added a rotation primitive for screwing. Throughout the entire process, online admittance learning continuously optimizes the admittance controller. Surprisingly, the proposed approach robustly aligns the nut and bolt and screws them together stably.
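The sketch below illustrates how such a composition might look: phase one reuses the alignment policy, phase two runs a fixed rotation primitive, and the online admittance update runs throughout. MockRobot and all other names are hypothetical stand-ins, not the actual implementation.

```python
import numpy as np

class MockRobot:
    def __init__(self):
        self.offset = 0.005           # initial bolt-nut misalignment [m]
        self.angle = 0.0              # accumulated tool rotation [rad]

    def aligned(self, tol=1e-4):
        return abs(self.offset) < tol

    def apply(self, action):
        self.offset -= action         # policy action reduces misalignment

    def rotate_tool(self, step):
        self.angle += step

    def update_admittance(self):
        pass                          # stands in for the online residual update

def screw(robot, align_policy, turns=3, step=0.2):
    while not robot.aligned():        # phase 1: align bolt with nut
        robot.apply(align_policy(robot.offset))
        robot.update_admittance()
    for _ in range(int(2 * np.pi * turns / step)):
        robot.rotate_tool(step)       # phase 2: fixed rotation primitive
        robot.update_admittance()     # adaptation keeps contact stable

screw(MockRobot(), align_policy=lambda offset: 0.2 * offset)
```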
Ablation on Smaller Clearances
Learned policy on 1 mm clearance
Testing on 0.5 mm clearance
Testing with the proposed approach
Ablation on Reward Function Design
Original reward
Distance-based reward
Distance-based reward on the real robot
Ablation on Model-Free RL Algorithms
DDPG
Direct Transfer
Proposed Method
TD3
Direct Transfer
Proposed Method
DDPG for Pivoting
TD3 for Pivoting
Comparison of Contact Force
Here we compare the contact forces in simulation and the real world. We command the robot to move downward at the same speed in both settings to establish contact with the table. The figure on the left shows the difference in contact force, which illustrates the sim-to-real gap.
Additional Study: Comparison of External Force Estimation
Here we compare the performance of different external force estimation/modeling methods. The task is for an admittance-controlled robot to move toward a table and establish contact. The objective function is purely the FITAVE objective, which promotes trajectory smoothness and reduces oscillation. We found that the 'record & replay' strategy performs better than fitting a force model from online data.
Force Model Fitting
With a poor initialization, the robot oscillates upon contacting the table. Here we fit a force model online and use it in our online admittance learning. However, the robot cannot stabilize itself.
Record & Replay
In comparison, with the same setup but using the record & replay strategy, online admittance residual learning efficiently reduces the oscillation and establishes stable contact.
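The minimal 1-D sketch below contrasts the two strategies under a toy admittance model: 'record & replay' evaluates candidate gains against the logged force sequence as-is, while force-model fitting first regresses a contact model from online data and rolls out against it. The dynamics, costs, and all names are our own illustrative assumptions.

```python
import numpy as np

def rollout(gains, ext_force, steps=500, dt=0.002):
    """Integrate M*a + D*v + K*x = f and return the position trace.
    ext_force is a callable (step index, position) -> force."""
    M, D, K = gains
    x = v = 0.0
    xs = []
    for k in range(steps):
        a = (ext_force(k, x) - D * v - K * x) / M
        v += a * dt
        x += v * dt
        xs.append(x)
    return np.array(xs)

def oscillation_cost(xs):
    return float(np.var(np.diff(xs)))   # penalize oscillatory motion

# Pretend this force sequence was logged on the robot during the
# previous execution step (record phase).
recorded = 5.0 + 0.5 * np.sin(0.1 * np.arange(500))

# Record & replay: evaluate candidate gains by replaying the logged
# forces exactly as recorded, independent of the simulated state.
cost_replay = oscillation_cost(
    rollout((2.0, 30.0, 400.0), lambda k, x: recorded[k]))

# Force-model fitting: regress a linear contact model f ~ a*x + b from
# (state, force) pairs gathered online, then roll out against the model;
# the fit is only as good as the data collected while oscillating.
states = 0.01 * np.random.randn(500)
a_fit, b_fit = np.polyfit(states, recorded, 1)
cost_model = oscillation_cost(
    rollout((2.0, 30.0, 400.0), lambda k, x: a_fit * x + b_fit))
```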
Additional Study: Utilizing Online Admittance Learning for Obtaining Optimal Admittance/Impedance Control Parameters
Here we demonstrate the effectiveness of using online admittance learning alone, starting from randomly initialized admittance control parameters. We let the robot contact table surfaces of different materials. With randomly initialized parameters, the robot oscillates and cannot stabilize itself. With online admittance learning, the robot instantly finds the optimal control parameters and eliminates the oscillation within one step.
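For reference, the sketch below shows a minimal discrete 1-D admittance controller of the kind being tuned here (notation and numbers ours). Randomly drawn gains typically leave the contact loop under-damped, producing the bouncing shown in the clips below; gains near critical damping, roughly what the online learning converges toward, keep contact stable.

```python
import numpy as np

# The controller maps the measured external force to a position offset
# via M*e'' + D*e' + K*e = f_ext, added to the reference trajectory.
class Admittance1D:
    def __init__(self, M, D, K, dt=0.002):
        self.M, self.D, self.K, self.dt = M, D, K, dt
        self.e = 0.0    # position offset from the reference trajectory
        self.de = 0.0   # offset velocity

    def step(self, f_ext):
        dde = (f_ext - self.D * self.de - self.K * self.e) / self.M
        self.de += dde * self.dt
        self.e += self.de * self.dt
        return self.e   # added to the commanded reference position

# Random gains are usually under-damped (bouncing); D = 2*sqrt(K*M)
# gives critical damping and stable contact.
random_gains = Admittance1D(M=1.0, D=2.0, K=900.0)                # zeta ~ 0.03
tuned_gains = Admittance1D(M=1.0, D=2 * np.sqrt(900.0), K=900.0)  # zeta = 1
```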
Random Gain
Plastic Surface. Bouncing on the surface.
Online Admittance Learning
Plastic Surface. Instantly eliminates oscillations.
Random Gain
Metal Surface. Bouncing on the surface.
Online Admittance Learning
Metal Surface. Instantly eliminates oscillations.
Random Gain
Wood Surface. Bouncing on the surface.
Online Admittance Learning
Wood Surface. Instantly eliminates oscillations.